Topic-based web site summarization

Authors

  • Yongzheng Zhang
  • Evangelos E. Milios
  • A. Nur Zincir-Heywood

Abstract

Purpose: Summarizing an entire Web site with diverse content may produce a summary heavily biased towards the site's dominant topics. This paper presents a novel topic-based framework to address this problem.

Design/methodology/approach: A two-stage framework is proposed. The first stage identifies the main topics covered in a Web site via clustering, and the second stage summarizes each topic separately. The proposed system is evaluated by a user study and compared with the single-topic summarization approach.

Findings: The user study demonstrates that the clustering-summarization approach outperforms the plain summarization approach in the multi-topic Web site summarization task, and that the difference is statistically significant. Text-based clustering that selects features with high variance over Web pages is reliable; outgoing links are useful if a rich set of cross links is available.

Research limitations/implications: Clustering methods more sophisticated than those used in this study are worth investigating. The proposed method should be tested on Web content that is less structured than organizational Web sites, for example blogs.

Practical implications: The proposed summarization framework can be applied to the effective organization of search engine results and to faceted or topical browsing of large Web sites.

Originality/value: Several key components, including feature selection, link analysis, key phrase extraction and key sentence extraction, are integrated for Web site summarization for the first time. The study also gives insight into the contributions of links and content to topic-based summarization. A classification approach is used to minimize the number of parameters.

Article Type: Research paper
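The abstract describes the two stages only at a high level. The following is a minimal, self-contained sketch of one way they could fit together, not the authors' implementation. It assumes relative term frequencies as page features, keeps the terms with the highest variance across pages, and uses plain k-means as the clustering step (the abstract says only "clustering"). Each cluster is then summarized by frequency-ranked single-word keywords plus the sentence containing the most keyword occurrences. The function names, the tiny stop-word list and the parameters `n_features` and `n_keywords` are hypothetical. The sketch also ignores the outgoing-link features and the classification-based key phrase and key sentence extraction that the paper mentions.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real system would use a fuller one).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "our", "the", "to", "with"}


def tokenize(text):
    """Lowercase alphabetic tokens with stop words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tf_vectors(pages):
    """One dict per page mapping term -> relative frequency within that page."""
    vectors = []
    for page in pages:
        counts = Counter(tokenize(page))
        total = sum(counts.values()) or 1
        vectors.append({t: c / total for t, c in counts.items()})
    return vectors


def high_variance_terms(vectors, k):
    """Keep the k terms whose relative frequency varies most across pages."""
    n = len(vectors)

    def variance(term):
        xs = [v.get(term, 0.0) for v in vectors]
        mean = sum(xs) / n
        return sum((x - mean) ** 2 for x in xs) / n

    vocab = {t for v in vectors for t in v}
    # Highest variance first; ties broken alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-variance(t), t))[:k]


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def summarize_by_topic(pages, n_topics, n_features=5, n_keywords=3):
    """Stage 1: cluster pages on high-variance terms. Stage 2: summarize each cluster."""
    vectors = tf_vectors(pages)
    features = high_variance_terms(vectors, n_features)
    points = [[v.get(t, 0.0) for t in features] for v in vectors]
    labels = kmeans(points, n_topics)

    summaries = {}
    for c in range(n_topics):
        members = [page for page, lab in zip(pages, labels) if lab == c]
        counts = Counter(t for page in members for t in tokenize(page))
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        keywords = [t for t, _ in ranked[:n_keywords]]
        sentences = [s.strip() for page in members
                     for s in re.split(r"[.!?]", page) if s.strip()]
        # Key sentence: most keyword occurrences; the first sentence wins ties.
        best = max(sentences, key=lambda s: sum(t in keywords for t in tokenize(s)),
                   default="")
        summaries[c] = {"keywords": keywords, "key_sentence": best}
    return labels, summaries


pages = [
    "Admissions for graduate students. Apply for admissions online.",
    "Graduate admissions deadlines and student funding.",
    "Research in machine learning and data mining.",
    "Our research lab studies machine learning.",
]
labels, summaries = summarize_by_topic(pages, n_topics=2)
print(labels)  # [0, 0, 1, 1]
for c, summary in summaries.items():
    print(c, summary["keywords"], "|", summary["key_sentence"])
```

On this toy site the admissions pages and the research pages land in separate clusters, so each topic receives its own keywords and key sentence instead of one summary dominated by whichever topic has more pages. Variance-based selection favours terms that separate pages (here "admissions", "research", "machine", "learning") over terms spread evenly across the site.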

Similar resources

A Framework for Summarization of Multi-topic Web Sites

Web site summarization, which identifies the essential content covered in a given Web site, plays an important role in Web information management. However, straightforward summarization of an entire Web site with diverse content may lead to a summary heavily biased to the dominant topics covered in the target Web site. In this paper, we propose a two-stage framework for effective summarization ...

Extraction-Based Text Summarization Using Fuzzy Analysis

Due to the explosive growth of the world-wide web, automatic text summarization has become an essential tool for web users. In this paper we present a novel approach for creating text summaries. Using fuzzy logic and WordNet, our model extracts the most relevant sentences from an original document. The approach utilizes fuzzy measures and inference on the extracted textual information from the docu...

A Comparative Study on Key Phrase Extraction Methods in Automatic Web Site Summarization

Web Site Summarization is the process of automatically generating a concise and informative summary for a given Web site. It has gained more and more attention in recent years as effective summarization could lead to enhanced Web information retrieval systems such as searching for Web sites. Extraction-based approaches to Web site summarization rely on the extraction of the most significant sen...

A Comparison of Word- and Term-based Methods for Automatic Web Site Summarization

Automatic Web site summarization is an effective means of making the content of a web site easily accessible to Web users. We demonstrate that a content-based approach to summarization, which is based on keyword and key sentence extraction from narrative text, is able to generate summaries that are as informative as human authored summaries. This work is directed towards summary generation base...

Text Summarization Using Cuckoo Search Optimization Algorithm

Today, with the rapid growth of the World Wide Web and the creation of Internet sites and online text resources, text summarization has attracted the attention of many researchers. Extractive text summarization is an important summarization method that consists of selecting the top representative sentences from the input document. When we are faced with documents of large data volume, the extr...

Journal title:
  • IJWIS

Volume 6

Publication date: 2010